US 6,263,307 B1

ADAPTIVE WEINER FILTERING USING LINE SPECTRAL FREQUENCIES

CROSS-REFERENCE TO RELATED APPLICATIONS

Cofiled patent applications with Ser. Nos. 08/424,928, 08/425,125, 08/426,746, and 08/426,427 are copending and disclose related subject matter. These applications all have a common assignee.

BACKGROUND OF THE INVENTION

The invention relates to electronic devices, and, more particularly, to speech analysis and synthesis devices and systems.

Human speech consists of a stream of acoustic signals with frequencies ranging up to roughly 20 KHz; but the band of 100 Hz to 5 KHz contains the bulk of the acoustic energy. Telephone transmission of human speech originally consisted of conversion of the analog acoustic signal stream into an analog electrical voltage signal stream (e.g., microphone) for transmission and reconversion to an acoustic signal stream (e.g., loudspeaker) for reception.

The advantages of digital electrical signal transmission led to a conversion from analog to digital telephone transmission beginning in the 1960s. Typically, digital telephone signals arise from sampling analog signals at 8 KHz and nonlinearly quantizing the samples with 8-bit codes according to the μ-law (pulse code modulation, or PCM). A clocked digital-to-analog converter and companding amplifier reconstruct an analog electrical signal stream from the stream of 8-bit samples. Such signals require transmission rates of 64 Kbps (kilobits per second). Many communications applications, such as digital cellular telephone, cannot handle such a high transmission rate, and this has inspired various speech compression methods.

The storage of speech information in analog format (e.g., on magnetic tape in a telephone answering machine) can likewise be replaced with digital storage. However, the memory demands can become overwhelming: 10 minutes of 8-bit PCM sampled at 8 KHz would require about 5 MB (megabytes) of storage. This demands speech compression analogous to digital transmission compression.

One approach to speech compression models the physiological generation of speech and thereby reduces the necessary information transmitted or stored. In particular, the linear speech production model presumes excitation of a variable filter (which roughly represents the vocal tract) by either a pulse train for voiced sounds or white noise for unvoiced sounds followed by amplification or gain to adjust the loudness. The model produces a stream of sounds simply by periodically making a voiced/unvoiced decision plus adjusting the filter coefficients and the gain. Generally, see Markel and Gray, Linear Prediction of Speech (Springer-Verlag 1976).

More particularly, the linear prediction method partitions a stream of speech samples s(n) into "frames" of, for example, 180 successive samples (22.5 msec intervals for an 8 KHz sampling rate); and the samples in a frame then provide the data for computing the filter coefficients for use in coding and synthesis of the sound associated with the frame. Each frame generates coded bits for the linear prediction filter coefficients (LPC), the pitch, the voiced/unvoiced decision, and the gain. This approach of encoding only the model parameters represents far fewer bits than encoding the entire frame of speech samples directly, so the transmission rate may be only 2.4 Kbps rather than the 64 Kbps of PCM. In practice, the LPC coefficients must be quantized for transmission, and the sensitivity of the filter behavior to the quantization error has led to quantization based on the Line Spectral Frequencies (LSF) representation.

To improve the sound quality, further information may be extracted from the speech, compressed and transmitted or stored along with the LPC coefficients, pitch, voicing, and gain. For example, the codebook excitation linear prediction (CELP) method first analyzes a speech frame to find the LPC filter coefficients, and then filters the frame with the LPC filter. Next, CELP determines a pitch period from the filtered frame and removes this periodicity with a comb filter to yield a noise-looking excitation signal. Lastly, CELP encodes the excitation signals using a codebook. Thus CELP transmits the LPC filter coefficients, pitch, gain, and the codebook index of the excitation signal.

The advent of digital cellular telephones has emphasized the role of noise suppression in speech processing, both coding and recognition. Customer expectation of high performance even in extreme car noise situations plus the demand to move to progressively lower data rate speech coding in order to accommodate the ever-increasing number of cellular telephone customers have contributed to the importance of noise suppression. While higher data rate speech coding methods tend to maintain robust performance even in high noise environments, that typically is not the case with lower data rate speech coding methods. The speech quality of low data rate methods tends to degrade drastically with high additive noise. Noise suppression to prevent such speech quality losses is important, but it must be achieved without introducing any undesirable artifacts or speech distortions or any significant loss of speech intelligibility. These performance goals for noise suppression have existed for many years, and they have recently come to the forefront due to digital cellular telephone application.

FIG. 1a schematically illustrates an overall system 100 of modules for speech acquisition, noise suppression, analysis, transmission/storage, synthesis, and playback. A microphone converts sound waves into electrical signals, and sampling analog-to-digital converter 102 typically samples at 8 KHz to cover the speech spectrum up to 4 KHz. System 100 may partition the stream of samples into frames with smooth windowing to avoid discontinuities. Noise suppression 104 filters a frame to suppress noise, and analyzer 106 extracts LPC coefficients, pitch, voicing, and gain from the noise-suppressed frame for transmission and/or storage 108. The transmission may be any type used for digital information transmission, and the storage may likewise be any type used to store digital information. Of course, types of encoding analysis other than LPC could be used. Synthesizer 110 combines the LPC coefficients, pitch, voicing, and gain information to synthesize frames of sampled speech which digital-to-analog converter (DAC) 112 converts to analog signals to drive a loudspeaker or other playback device to regenerate sound waves.

FIG. 1b shows an analogous system 150 for voice recognition with noise suppression. The recognition analyzer may simply compare input frames with frames from a database or may analyze the input frames and compare parameters with known sets of parameters. Matches found between input frames and stored information provide recognition output.

One approach to noise suppression in speech employs spectral subtraction and appears in Boll, Suppression of Acoustic Noise in Speech Using Spectral Subtraction, 27
IEEE Tr. ASSP 113 (1979), and Lim and Oppenheim, Enhancement and Bandwidth Compression of Noisy Speech, 67 Proc. IEEE 1586 (1979). Spectral subtraction proceeds roughly as follows. Presume a sampled speech signal s(j) with uncorrelated additive noise n(j) to yield an observed windowed noisy speech y(j)=s(j)+n(j). These are random processes over time. Noise is assumed to be a stationary process in that the process's autocorrelation depends only on the difference of the variables; that is, there is a function $r_N(\cdot)$ such that:

$$E\{n(j)\,n(j+i)\} = r_N(i)$$

where E is the expectation. The Fourier transform of the autocorrelation is called the power spectral density, $P_N(\omega)$. If speech were also a stationary process with autocorrelation $r_S(j)$ and power spectral density $P_S(\omega)$, then the power spectral densities would add due to the lack of correlation:

$$P_Y(\omega) = P_S(\omega) + P_N(\omega)$$

Hence, an estimate for $P_S(\omega)$, and thus s(j), could be obtained from the observed noisy speech y(j) and the noise observed during intervals of (presumed) silence in the observed noisy speech. In particular, take $P_Y(\omega)$ as the squared magnitude of the Fourier transform of y(j) and $P_N(\omega)$ as the squared magnitude of the Fourier transform of the observed noise.

Of course, speech is not a stationary process, so Lim and Oppenheim modified the approach as follows. Take s(j) not to represent a random process but rather to represent a windowed speech signal (that is, a speech signal which has been multiplied by a window function), n(j) a windowed noise signal, and y(j) the resultant windowed observed noisy speech signal. Then Fourier transforming and multiplying by complex conjugates yields:

$$|Y(\omega)|^2 = |S(\omega)|^2 + |N(\omega)|^2 + 2\,\mathrm{Re}\{S(\omega)N(\omega)^*\}$$

For ensemble averages the last term on the righthand side of the equation equals zero due to the lack of correlation of noise with the speech signal. This equation thus yields an estimate, $\hat{S}(\omega)$, for the speech signal Fourier transform as:

$$|\hat{S}(\omega)|^2 = |Y(\omega)|^2 - E\{|N(\omega)|^2\}$$

This resembles the preceding equation for the addition of power spectral densities.

An autocorrelation approach for the windowed speech and noise signals simplifies the mathematics. In particular, the autocorrelation for the speech signal is given by

$$r_S(j) = \sum_i s(i)\,s(i+j)$$

with similar expressions for the autocorrelation for the noisy speech and the noise. Thus the noisy speech autocorrelation is:

$$r_Y(j) = r_S(j) + r_N(j) + c_{SN}(j) + c_{SN}(-j)$$

where $c_{SN}(\cdot)$ is the cross correlation of s(j) and n(j). But the speech and noise signals should be uncorrelated, so the cross correlations can be approximated as 0. Hence, $r_Y(j) = r_S(j) + r_N(j)$. And the Fourier transforms of the autocorrelations are just the power spectral densities, so

$$P_Y(\omega) = P_S(\omega) + P_N(\omega)$$

Of course, $P_Y(\omega)$ equals $|Y(\omega)|^2$ with $Y(\omega)$ the Fourier transform of y(j) due to the autocorrelation being just a convolution with a time-reversed variable.

The power spectral density $P_N(\omega)$ of the noise signal can be estimated by detection during noise-only periods, so the speech power spectral estimate becomes

$$\hat{P}_S(\omega) = P_Y(\omega) - P_N(\omega)$$

which is the spectral subtraction.

The spectral subtraction method can be interpreted as a time-varying linear filter $H(\omega)$ so that $\hat{S}(\omega) = H(\omega)Y(\omega)$ which the foregoing estimate then defines as:

$$H(\omega)^2 = [P_Y(\omega) - P_N(\omega)]/P_Y(\omega)$$

The ultimate estimate for the frame of windowed speech, $\hat{s}(j)$, then equals the inverse Fourier transform of $\hat{S}(\omega)$, and then combining the estimates from successive frames ("overlap add") yields the estimated speech stream.

This spectral subtraction can attenuate noise substantially, but it has problems including the introduction of fluctuating tonal noises commonly referred to as musical noises.

The Lim and Oppenheim article also describes an alternative noise suppression approach using noncausal Wiener filtering which minimizes the mean-square error. That is, again $\hat{S}(\omega) = H(\omega)Y(\omega)$ but with $H(\omega)$ now given by:

$$H(\omega) = P_S(\omega)/[P_S(\omega) + P_N(\omega)]$$

This Wiener filter generalizes to:

$$H(\omega) = \{P_S(\omega)/[P_S(\omega) + \alpha P_N(\omega)]\}^{\beta}$$

where constants $\alpha$ and $\beta$ are called the noise suppression factor and the filter power, respectively. Indeed, $\alpha = 1$ and $\beta = 1/2$ leads to the spectral subtraction method in the following.

A noncausal Wiener filter cannot be directly applied to provide an estimate for s(j) because speech is not stationary and the power spectral density $P_S(\omega)$ is not known. Thus approximate the noncausal Wiener filter by an adaptive generalized Wiener filter which uses the squared magnitude of the estimate $\hat{S}(\omega)$ in place of $P_S(\omega)$:

$$H(\omega) = \left(|\hat{S}(\omega)|^2/[\,|\hat{S}(\omega)|^2 + \alpha E\{|N(\omega)|^2\}\,]\right)^{\beta}$$

Recalling $\hat{S}(\omega) = H(\omega)Y(\omega)$ and then solving for $|\hat{S}(\omega)|$ in the $\beta = 1/2$ case yields:

$$|\hat{S}(\omega)| = [\,|Y(\omega)|^2 - \alpha E\{|N(\omega)|^2\}\,]^{1/2}$$

which just replicates the spectral subtraction method when $\alpha = 1$.
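The relation between the generalized Wiener gain and spectral subtraction stated above can be checked numerically for a single frequency bin. The following is only a minimal sketch: the function names, the guards against non-positive values, and the example powers are illustrative assumptions, not part of the described method.

```python
import math

def generalized_wiener_gain(p_s, p_n, alpha=1.0, beta=1.0):
    """Gain H = (P_S / (P_S + alpha * P_N)) ** beta for one frequency bin."""
    p_s = max(p_s, 0.0)          # assumed guard: a power estimate cannot be negative
    denom = p_s + alpha * p_n
    if denom <= 0.0:
        return 0.0
    return (p_s / denom) ** beta

def spectral_subtraction_gain(p_y, p_n, alpha=1.0):
    """Gain H = sqrt(max(0, 1 - alpha * P_N / P_Y)) for one frequency bin."""
    if p_y <= 0.0:
        return 0.0
    return math.sqrt(max(0.0, 1.0 - alpha * p_n / p_y))

# Example bin (made-up numbers): noisy power 4.0 and noise power 1.0,
# so the subtracted speech power estimate is 3.0.
p_y, p_n = 4.0, 1.0
p_s_hat = p_y - p_n
h_wiener = generalized_wiener_gain(p_s_hat, p_n, alpha=1.0, beta=0.5)
h_subtract = spectral_subtraction_gain(p_y, p_n, alpha=1.0)
```

With the speech power taken as the subtracted estimate, alpha equal to 1, and beta equal to 1/2, both functions return the same gain, the square root of 0.75.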

However, this generalized Wiener filtering has problems including how to estimate $\hat{S}$, and estimators usually apply an iterative approach with perhaps a half dozen iterations which increases computational complexity.

Ephraim, A Minimum Mean Square Error Approach for Speech Enhancement, Conf. Proc. ICASSP 829 (1990), derived a Wiener filter by first analyzing noisy speech to find linear prediction coefficients (LPC) and then resynthesizing an estimate of the speech to use in the Wiener filter.

In contrast, O'Shaughnessy, Speech Enhancement Using Vector Quantization and a Formant Distance Measure, Conf. Proc. ICASSP 549 (1988), computed noisy speech formants and selected quantized speech codewords to represent the speech based on formant distance; the speech was resynthesized from the codewords. This has problems including degradation for high signal-to-noise signals because of the speech quality limitations of the LPC synthesis.

The Fourier transforms of the windowed sampled speech signals in systems 100 and 150 can be computed in either fixed point or floating point format. Fixed point is cheaper to implement in hardware but has less dynamic range for a comparable number of bits. Automatic gain control limits the dynamic range of the speech samples by adjusting magnitudes according to a moving average of the preceding sample magnitudes, but this also destroys the distinction between loud and quiet speech. Further, the acoustic energy may be concentrated in a narrow frequency band and the Fourier transform will have large dynamic range even for speech samples with relatively constant magnitude. To compensate for such overflow potential in fixed point format, a few bits may be reserved for large Fourier transform dynamic range; but this implies a loss of resolution for small magnitude samples and consequent degradation of quiet speech. This is especially true for systems which follow a Fourier transform with an inverse Fourier transform.

SUMMARY OF THE INVENTION

The present invention provides speech noise suppression by spectral subtraction filtering improved with filter clamping, limiting, and/or smoothing, plus generalized Wiener filtering with a signal-to-noise ratio dependent noise suppression factor, and plus a generalized Wiener filter based on a speech estimate derived from codebook noisy speech analysis and resynthesis. And each frame of samples has a frame-energy-based scaling applied prior to and after Fourier analysis to preserve quiet speech resolution.

The invention has advantages including simple speech noise suppression.

BRIEF DESCRIPTION OF THE DRAWINGS

The drawings are schematic for clarity.

FIGS. 1a-b show speech systems with noise suppression.

FIG. 2 illustrates a preferred embodiment noise suppression subsystem.

FIGS. 3-5 are flow diagrams for preferred embodiment noise suppression.

FIG. 6 is a flow diagram for a frame-scaling preferred embodiment.

FIGS. 7-8 illustrate spectral subtraction preferred embodiment aspects.

FIGS. 9a-b show spectral subtraction preferred embodiment systems.

FIGS. 10a-b illustrate spectral subtraction preferred embodiments with adaptive minimum gain clamping.

FIG. 11 is a block diagram of a modified Wiener filter preferred embodiment system.

FIG. 12 shows a codebook based generalized Wiener filter preferred embodiment system.

FIG. 13 illustrates a preferred embodiment internal precision control system.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

Overview

FIG. 2 shows a preferred embodiment noise suppression filter system 200. In particular, frame buffer 202 partitions an incoming stream of speech samples into overlapping frames of 256-sample size and windows the frames; FFT module 204 converts the frames to the frequency domain by fast Fourier transform; multiplier 206 pointwise multiplies the frame by the filter coefficients generated in noise filter block 208; and IFFT module 210 converts back to the time domain by inverse fast Fourier transform. Noise suppressed frame buffer 212 holds the filtered output for speech analysis, such as LPC coding, recognition, or direct transmission. The filter coefficients in block 208 derive from estimates for the noise spectrum and the noisy speech spectrum of the frame, and thus adapt to the changing input. All of the noise suppression computations may be performed with a standard digital signal processor such as a TMS320C25, which can also perform the subsequent speech analysis, if any. Also, general purpose microprocessors or specialized hardware could be used.

The preferred embodiment noise suppression filters may also be realized without Fourier transforms; however, the multiplication of Fourier transforms then corresponds to convolution of functions.

The preferred embodiment noise suppression filters may each be used as the noise suppression blocks in the generic systems of FIGS. 1a-b to yield preferred embodiment systems.

The smoothed spectral subtraction preferred embodiments have a spectral subtraction type filter which (1) clamps attenuation to limit suppression for inputs with small signal-to-noise ratios, (2) increases the noise estimate to avoid filter fluctuations, (3) smoothes the noisy speech and noise spectra used for filter definition, and (4) updates a noise spectrum estimate from the preceding frame using the noisy speech spectrum. The attenuation clamp may depend upon speech and noise estimates in order to lessen the attenuation (and distortion) for speech; this strategy may depend upon estimates only in a relatively noise-free frequency band. FIG. 3 is a flow diagram showing all four aspects for the generation of the noise suppression filter of block 208.

The signal-to-noise ratio adaptive generalized Wiener filter preferred embodiments use $H(\omega) = [\hat{P}_S(\omega)/[\hat{P}_S(\omega) + \alpha\hat{P}_N(\omega)]]^{\beta}$ where the noise suppression factor $\alpha$ depends on $E_Y/E_N$ with $E_N$ the noise energy and $E_Y$ the noisy speech energy for the frame. These preferred embodiments also use an LPC spectral approximation of the noisy speech for a smoothed speech power spectrum estimate as illustrated in the flow diagram of FIG. 4. FIG. 4 also illustrates an optional filtered $\alpha$.

The codebook-based generalized Wiener filter noise suppression preferred embodiments use $H(\omega) = [\hat{P}_S(\omega)/[\hat{P}_S(\omega) + \alpha\hat{P}_N(\omega)]]^{\beta}$ with $\hat{P}_S(\omega)$ estimated from LSFs as weighted sums of LSFs in a codebook of LSFs with the weights determined by the LSFs of the input noisy speech. Then iterate: use this $H(\omega)$ to form $H(\omega)Y(\omega)$, next redetermine the input LSFs from $H(\omega)Y(\omega)$, and then redetermine $H(\omega)$ with these LSFs as weights for the codebook LSFs. A half dozen iterations may be used. FIG. 5 illustrates the flow.

The power estimates used in the preferred embodiment filter definitions may also be used for adaptive scaling of low power signals to avoid loss of precision during FFT or other operations. The scaling factor adapts to each frame so that with fixed-point digital computations the scale expands or contracts the samples to provide a constant overflow headroom, and after the computations the inverse scale restores the frame power level. FIG. 6 illustrates the flow. This scaling applies without regard to automatic gain control and could even be used in conjunction with an automatic gain controlled input.

Smoothed spectral subtraction preferred embodiments

FIG. 3 illustrates as a flow diagram the various aspects of the spectral subtraction preferred embodiments as used to generate the filter. A preliminary consideration of the standard spectral subtraction noise suppression simplifies explanation of the preferred embodiments. Thus first consider the standard spectral subtraction filter:

$$H(\omega)^2 = [\,|Y(\omega)|^2 - |N(\omega)|^2\,]/|Y(\omega)|^2 = 1 - |N(\omega)|^2/|Y(\omega)|^2$$

A graph of this function with logarithmic scales appears in FIG. 7 labelled "standard spectral subtraction". Indeed, spectral subtraction consists of applying a frequency-dependent attenuation to each frequency in the noisy speech power spectrum with the attenuation tracking the input signal-to-noise power ratio at each frequency. That is, $H(\omega)$ represents a linear time-varying filter. Consequently, as shown in FIG. 7, the amount of attenuation varies rapidly with input signal-to-noise power ratio, especially when the input signal and noise are nearly equal in power. When the input signal contains only noise, the filtering produces musical noise because the estimated input signal-to-noise power ratio at each frequency fluctuates due to measurement error, producing attenuation with random variation across frequencies and over time. FIG. 8 shows the probability distribution of the FFT power spectral estimate at a given frequency of white noise with unity power (labelled "no smoothing"), and illustrates the amount of variation which can be expected.

The preferred embodiments modify this standard spectral subtraction in four independent but synergistic approaches as detailed in the following.

Preliminarily, partition an input stream of noisy speech sampled at 8 KHz into 256-sample frames with a 50% overlap between successive frames; that is, each frame shares its first 128 samples with the preceding frame and shares its last 128 samples with the succeeding frame. This yields an input stream of frames with each frame having 32 msec of samples and a new frame beginning every 16 msec.

Next, multiply each frame with a Hann window of width 256. (A Hann window has the form $w(k) = (1 + \cos(2\pi k/K))/2$ with K+1 the window width.) Thus each frame has 256 samples y(j), and the frames add to reconstruct the input speech stream.

Fourier transform the windowed speech to find $Y(\omega)$ for the frame; the noise spectrum estimation differs from the traditional methods and appears in modification (4).

(1) Clamp the $H(\omega)$ attenuation curve so that the attenuation cannot go below a minimum value; FIG. 7 has this labelled as "clamped" and illustrates a 10 dB clamp. The clamping prevents the noise suppression filter $H(\omega)$ from fluctuating around very small gain values, and also reduces potential speech signal distortion. The corresponding filter would be:

$$H(\omega)^2 = \max[10^{-2},\ 1 - |N(\omega)|^2/|Y(\omega)|^2]$$

Of course, the 10 dB clamp could be replaced with any other desirable clamp level, such as 5 dB or 20 dB. Also, the clamping could include a sloped clamp or stepped clamping or other more general clamping curves, but a simple clamp lessens computational complexity. The following Adaptive Filter Clamp section describes a clamp which adapts to the input signal energy level.

(2) Increase the noise power spectrum estimate by a factor such as 2 so that small errors in the spectral estimates for input (noisy) signals do not result in fluctuating attenuation filters. The corresponding filter for this factor alone would be:

$$H(\omega)^2 = 1 - 4|N(\omega)|^2/|Y(\omega)|^2$$

For small input signal-to-noise power ratios this becomes negative, but a clamp as in (1) eliminates the problem. This noise increase factor appears as a shift in the logarithmic input signal-to-noise power ratio independent variable of FIG. 7. Of course, the 2 factor could be replaced by other factors such as 1.5 or 3; indeed, FIG. 7 shows a 5 dB noise increase factor with the resulting attenuation curve labelled "noise increased". Further, the factor could vary with frequency such as more noise increase (i.e., more attenuation) at low frequencies.

(3) Reduce the variance of the spectral estimates used in the noise suppression filter $H(\omega)$ by smoothing over neighboring frequencies. That is, for an input windowed noisy speech signal y(j) with Fourier transform $Y(\omega)$, apply a running average over frequency so that $|Y(\omega)|^2$ is replaced by $(W*|Y|^2)(\omega)$ in $H(\omega)$ where $W(\omega)$ is a window about 0 and $*$ is the convolution operator. FIG. 8 shows that the spectral estimates for white noise converge more closely to the correct answer with increasing smoothing window size. That is, the curves labelled "5 element smoothing" and "33 element smoothing" show the decreasing probabilities for large variations with increasing smoothing window sizes. More spectral smoothing reduces noise fluctuations in the filtered speech signal because it reduces the variance of spectral estimation for noisy frames; however, spectral smoothing decreases the spectral resolution so that the noise suppression attenuation filter cannot track sharp spectral characteristics. The preferred embodiment operates with sampling at 8 KHz and windows the input into frames of size 256 samples (32 milliseconds); thus an FFT on the frame generates the Fourier transform as a function on a domain of 256 frequency values. Take the smoothing window $W(\omega)$ to have a width of 32 frequencies, so convolution with $W(\omega)$ averages over 32 adjacent frequencies. $W(\omega)$ may be a simple rectangular window or any other window. The filter transfer function with such smoothing is:

$$H(\omega)^2 = 1 - |N(\omega)|^2/(W*|Y|^2)(\omega)$$

Thus a filter with all three of the foregoing features has transfer function:

$$H(\omega)^2 = \max[10^{-2},\ 1 - 4|N(\omega)|^2/(W*|Y|^2)(\omega)]$$

Extend the definition of $H(\omega)$ by symmetry to $\pi < \omega < 2\pi$ or $-\pi < \omega < 0$.

(4) Any noise suppression by spectral subtraction requires an estimate of the noise power spectrum. Typical methods update an average noise spectrum during periods of non-speech, but the performance of this approach depends upon accurate estimation of speech intervals, which is a difficult problem. Some kinds of acoustic noise may have speech-like characteristics, and if they are incorrectly classified as speech, then the noise estimate will not be updated frequently enough to track changes in the noise environment.

Consequently, the preferred embodiment takes noise as any signal which is always present. At each frequency recursively estimate the noise power spectrum $\hat{P}_N(\omega)$ for use in the filter $H(\omega)$ by updating the estimate from the previous frame, $\hat{P}_N'(\omega)$, using the current frame smoothed estimate for the noisy speech power spectrum, $P_Y(\omega) = (W*|Y|^2)(\omega)$, as follows:

$$\hat{P}_N(\omega) = \begin{cases} 0.978\,\hat{P}_N'(\omega) & \text{if } P_Y(\omega) < 0.978\,\hat{P}_N'(\omega) \\ P_Y(\omega) & \text{if } 0.978\,\hat{P}_N'(\omega) \le P_Y(\omega) \le 1.006\,\hat{P}_N'(\omega) \\ 1.006\,\hat{P}_N'(\omega) & \text{if } 1.006\,\hat{P}_N'(\omega) < P_Y(\omega) \end{cases}$$

For the first frame, just take $\hat{P}_N(\omega)$ equal to $P_Y(\omega)$.

Thus, the noise power spectrum estimate can increase up to 3 dB per second or decrease up to 12 dB per second. As a result, the noise estimates will only slightly increase during short speech segments, and will rapidly return to the correct value during pauses between words. The initial estimate can simply be taken as the first input frame which typically will be silence; of course, other initial estimates could be used such as a simple constant. This approach is simple to implement, and is robust in actual performance since it makes no assumptions about the characteristics of either the speech or the noise signals. Of course, multiplicative factors other than 0.978 and 1.006 could be used provided that the decrease limit exceeds the increase limit. That is, the product of the multiplicative factors is less than 1; e.g., (0.978)(1.006) is less than 1.
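The recursive noise update and the quoted slew rates can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the function names are hypothetical, the frame rate of 62.5 frames per second follows from a new frame every 16 msec, and reproducing the "3 dB" and "12 dB" per second figures requires reading the per-frame factors with a 20*log10 convention, which is an assumption made here.

```python
import math

def update_noise_estimate(prev_noise, noisy_power, down=0.978, up=1.006):
    """Per-bin update: follow noisy_power, but limit the change per frame."""
    updated = []
    for p_prev, p_y in zip(prev_noise, noisy_power):
        lower, upper = down * p_prev, up * p_prev
        updated.append(min(max(p_y, lower), upper))
    return updated

def db_per_second(factor, frames_per_second=62.5):
    """Slew rate implied by a per-frame factor, using 20*log10 (assumed)."""
    return frames_per_second * 20.0 * math.log10(factor)

prev = [1.0, 1.0, 1.0]
current = [0.5, 1.002, 3.0]      # below, inside, and above the allowed band
new = update_noise_estimate(prev, current)   # [0.978, 1.002, 1.006]
rise = db_per_second(1.006)      # a little over +3 dB per second
fall = db_per_second(0.978)      # about -12 dB per second
```

Clamping the current smoothed power between the two scaled copies of the previous estimate is equivalent to the three-case rule above, and it needs no speech/non-speech decision.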

A preferred embodiment filter may include one or more of the four modifications, and a preferred embodiment filter combining all four of the foregoing modifications will have a transfer function:

$$H(\omega)^2 = \max[10^{-2},\ 1 - 4\hat{P}_N(\omega)/(W*|Y|^2)(\omega)]$$

with $\hat{P}_N(\omega)$ the noise power estimate as in the preceding.
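A small sketch of the combined transfer function may help make the three per-bin operations concrete: smoothing the noisy power spectrum, scaling the noise estimate, and clamping. The helper names are hypothetical, the rectangular window wraps around the ends of the spectrum (an assumption; the text only specifies a window about 0 and extension by symmetry), and the toy spectrum is made up.

```python
def smooth_circular(power, width):
    """Running average over `width` neighbouring bins, wrapping at the ends."""
    n = len(power)
    half = width // 2
    out = []
    for k in range(n):
        total = 0.0
        for offset in range(-half, width - half):
            total += power[(k + offset) % n]
        out.append(total / width)
    return out

def suppression_gain_squared(noisy_power, noise_power, width=32,
                             noise_factor=4.0, floor=1e-2):
    """H(w)^2 = max[floor, 1 - noise_factor * P_N / (W * |Y|^2)] per bin."""
    smoothed = smooth_circular(noisy_power, width)
    gains = []
    for p_y, p_n in zip(smoothed, noise_power):
        raw = 1.0 - noise_factor * p_n / p_y if p_y > 0.0 else floor
        gains.append(max(floor, raw))
    return gains

# Toy 8-bin spectrum: one strong "speech" bin on a flat noise floor.
noisy = [1.0, 1.0, 1.0, 100.0, 1.0, 1.0, 1.0, 1.0]
noise = [1.0] * 8
h2 = suppression_gain_squared(noisy, noise, width=1)  # width=1 disables smoothing
```

In this toy case the noise-only bins sit at the clamp value and the strong bin is passed almost unattenuated; with a wider window the strong bin would also raise the gain of its neighbours, which is the loss of spectral resolution noted in (3).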

FIG. 9a shows in block form preferred embodiment noise suppressor 900 which implements a preferred embodiment spectral subtraction with all four of the preferred embodiment modifications. In particular, FFT module 902 performs a fast Fourier transform of an input frame to give $Y(\omega)$, magnitude squarer 904 generates $|Y(\omega)|^2$, convolver 906 yields $P_Y(\omega) = (W*|Y|^2)(\omega)$, noise buffer (memory) 908 holds $\hat{P}_N'(\omega)$, ALU (arithmetic logic unit plus memory) 910 compares $P_Y$ and $\hat{P}_N'$ and computes $\hat{P}_N$ and updates buffer 908, ALU 912 computes $1 - 4\hat{P}_N(\omega)/P_Y(\omega)$, clamper 914 computes $H(\omega)$, multiplier 920 applies $H(\omega)$ to $Y(\omega)$, and IFFT module 922 does an inverse Fourier transform to yield the noise-suppression filtered frame. Controller 930 provides the timing and enablement signals to the various components. Noise suppressor 900 inserted into the systems of FIGS. 1a-b as the noise suppression blocks provides preferred embodiment systems in which noise suppressor 900 in part controls the output.
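The frames supplied to FFT module 902 are the 256-sample, 50%-overlap, Hann-windowed frames described earlier, and the claim that such frames add to reconstruct the input can be checked with a short sketch. The code below is an assumption-laden illustration: it uses the periodic form of the Hann window (zero at index 0, length exactly 256), which is one common way to make shifted copies sum to exactly 1, and all function names are hypothetical.

```python
import math

FRAME = 256
HOP = FRAME // 2

def hann(n=FRAME):
    """Periodic Hann window; copies shifted by n/2 sum to exactly 1."""
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * k / n)) for k in range(n)]

def windowed_frames(samples, frame=FRAME, hop=HOP):
    """Yield Hann-windowed frames with 50% overlap."""
    w = hann(frame)
    for start in range(0, len(samples) - frame + 1, hop):
        yield [samples[start + k] * w[k] for k in range(frame)]

def overlap_add(frames, frame=FRAME, hop=HOP):
    """Sum frames back into a stream at their original offsets."""
    frames = list(frames)
    out = [0.0] * (hop * (len(frames) - 1) + frame) if frames else []
    for i, fr in enumerate(frames):
        for k, value in enumerate(fr):
            out[i * hop + k] += value
    return out

# Test signal (arbitrary): four frames' worth of a slow sinusoid.
signal = [math.sin(0.05 * j) for j in range(4 * FRAME)]
rebuilt = overlap_add(windowed_frames(signal))
```

Away from the first and last half frame, where only one window covers each sample, the rebuilt stream matches the input to within rounding error. In the suppressor, the filtering of each frame would sit between the windowing and the overlap-add.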
Adaptive Filter Clamp

The filter attenuation clamp of the preceding section can be replaced with an adaptive filter attenuation clamp. For example, take

$$H(\omega)^2 = \max[M^2,\ 1 - |N(\omega)|^2/|Y(\omega)|^2]$$

and let the minimum filter gain M depend upon the signal and noise power of the current frame (or, for computational simplicity, of the preceding frame). Indeed, when speech is present, it serves to mask low-level noise; therefore, M can be increased in the presence of speech without the listener hearing increased noise. This has the benefit of lessening the attenuation of the speech and thus causing less speech distortion. Because a common response to having difficulty communicating over the phone is to speak louder, this decrease in filter attenuation with increased speech power will lessen distortion and improve speech quality. Simply put, the system will transmit clearer speech the louder a person talks.

In particular, let YP be the sum of the signal power spectrum over the frequency range 1.8 KHz to 4.0 KHz: with a 256-sample frame sampling at 8 KHz and 256-point FFT, this corresponds to frequencies $51\pi/128$ to $\pi$. That is,

$$YP = \sum_{\omega} P_Y(\omega) \quad \text{for } 51\pi/128 \le \omega \le \pi$$

Similarly, let NP be the corresponding sum of the noise power:

$$NP = \sum_{\omega} \hat{P}_N(\omega) \quad \text{for } 51\pi/128 \le \omega \le \pi$$

with $\hat{P}_N(\omega)$ the noise estimate from the preceding section. The frequency range 1.8 KHz to 4.0 KHz lies in a band with small road noise for an automobile but still with significant speech power; thus detect the presence of speech by considering YP-NP. Then take M equal to A+B(YP-NP) where A is the minimum filter gain with an all noise input (analogous to the clamp of the preceding section), and B is the dependence of the minimum filter gain on speech power. For example, A could be -8 dB or -10 dB as in the preceding section, and B could be in the range of 1/4 to 1. Further, YP-NP may become negative for near silent frames, so preserve the minimum clamp at A by ignoring the B(YP-NP) factor when YP-NP is negative. Also, an upper limit of -4 dB for very loud frames could be imposed by replacing B(YP-NP) with min[-4 dB, B(YP-NP)].

More explicitly, presume a 16-bit fixed-point format of two's complement numbers, and presume that the noisy speech samples have been scaled so that numbers X arising in the computations will fall into the range -1≤X<+1, which in hexadecimal notation will be the range 8000 to 7FFF. Then the filter gain clamp could vary between A taken equal to 1000 (0.125), which is roughly -9 dB, and an upper limit for A+B(YP-NP) taken equal to 3000 (0.375), which is roughly -4.4 dB. More conservatively, the clamp could be constrained to the range of 1800 to 2800.
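The hexadecimal clamp values can be converted to fractions with a few lines of code. This is only a sketch: the function names are hypothetical, and the dB figures are computed with 10*log10, which is an assumption about the convention behind the quoted values.

```python
import math

def q15_to_float(word):
    """Interpret a 16-bit two's complement word as a fraction in [-1, 1)."""
    if word & 0x8000:
        word -= 0x10000
    return word / 32768.0

def power_db(value):
    """10*log10, treating the value as a power-like gain (assumed)."""
    return 10.0 * math.log10(value)

low = q15_to_float(0x1000)    # 0.125
high = q15_to_float(0x3000)   # 0.375
low_db = power_db(low)        # close to -9 dB
high_db = power_db(high)      # a little above -4.3 dB
```

Under that convention 0x1000 comes out at about -9.0 dB and 0x3000 at about -4.3 dB, close to the rounded figures quoted above; 0x8000 and 0x7FFF map to -1 and just under +1, the ends of the stated range.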

Furthermore, a simpler implementation of the adaptive clamp which still provides its advantages uses the M from the previous frame (called $M_{OLD}$) and takes M for the current frame simply equal to $(17/16)M_{OLD}$ when $M_{OLD}$ is less than A+B(YP-NP) and $(15/16)M_{OLD}$ when $M_{OLD}$ is greater than A+B(YP-NP).
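The adaptive clamp target and its simplified frame-to-frame update can be sketched as below. Everything numeric here is an illustrative assumption: the defaults A=0.125 and an upper limit of 0.375 are taken from the fixed-point example, B=0.25 is picked from the quoted range, the band powers YP and NP are assumed to be on a scale where B(YP-NP) is comparable to A, and the function names are hypothetical.

```python
def target_clamp(yp, noise_p, a=0.125, b=0.25, upper=0.375):
    """Target minimum gain A + B*(YP - NP), held between A and `upper`."""
    excess = yp - noise_p
    if excess <= 0.0:
        return a                     # near-silent frame: keep the floor at A
    return min(upper, a + b * excess)

def step_clamp(m_old, target):
    """Move the previous clamp up by 17/16 or down by 15/16 toward target."""
    if m_old < target:
        return m_old * 17.0 / 16.0
    if m_old > target:
        return m_old * 15.0 / 16.0
    return m_old

# A run of loud frames (made-up band powers) slowly raises the clamp.
m = 0.125
history = []
for _ in range(5):
    m = step_clamp(m, target_clamp(yp=3.0, noise_p=1.0))
    history.append(m)
```

Because each frame changes M by only about 6%, the clamp ramps up over several frames of speech and decays again during pauses, which matches the remark that memory and slow adaptation make the effective clamp nonlinear.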

The preceding adaptive clamp depends linearly on the speech power; however, other dependencies such as quadratic could also be used provided that the functional dependence is monotonic. Indeed, memory in system and slow adaptation rates for M make the clamp nonlinear.

The frequency range used to measure the signal and noise 
powers could be varied, such as 1.2 KHz to 4.0 KHz or 
another band (or bands) depending upon the noise environ- 
ment. FIG. 10a heuristically illustrates an adaptive clamp in 
a form analogous to FIG. 7; of course, the adaptive clamp 
depends upon the magnitude of the difference of the sums 
(over a band) of input and noise powers, whereas the 
independent variable in FIG. 10a is the power ratio at a 
single frequency. However, as the power ratio increases for 
"average" frequencies, the magnitude of the difference of the 
sums of input and noise powers over the band also increases, 
so the clamp ramps up as indicated in FIG. 10a for "average" 
frequencies. FIG. 10b more accurately shows the varying 
adaptive clamp levels for a single frequency: the clamp 
varies with the difference of the sums of the input and noise 
powers as illustrated by the vertical arrow. Of course, the 
clamp, whether adaptive or constant, could be used without 
the increased noise, and the lefthand portions of the clamp 
curves together with the standard spectral curve of FIGS.
10a-b would apply.

Note that the adaptive clamp could be taken as dependent
upon the ratio YP/NP instead of just the difference or on
some combination. Also, the positive slope of the adaptive
clamp (see FIG. 10a) could be used to have a greater
attenuation (e.g., -15 dB) for the independent variable equal
to 0 and ramp up to an attenuation less than the constant
clamp (which is -10 dB) for the independent variable greater
than 3 dB. The adaptive clamp achieves both better speech
quality and better noise attenuation than the constant clamp.



Note that the estimates YP and NP could be defined by the
previous frame in order to make an implementation on a
DSP more memory efficient. For most frames the YP and NP
will be close to those of the preceding frame.

FIG. 9b illustrates in block form preferred embodiment
noise suppressor 950 which includes the components of
system 900 but with an adaptive clamper 954 which has the
additional inputs of YP from filter 956 and NP from filter
960. Insertion of noise suppressor 950 into the systems of
FIGS. 1a-b as the noise suppression blocks provides
preferred embodiment systems in which noise suppressor 950
in part controls the output.

Modified generalized Wiener filter preferred embodiments

FIG. 4 is a flow diagram for a modified generalized
Wiener filter preferred embodiment. Recall that a generalized
Wiener filter with power $\beta$ equal to 1/2 has a transfer
function:

$$H(\omega)^2 = \frac{\hat{P}_S(\omega)}{\hat{P}_S(\omega) + \alpha \hat{P}_N(\omega)}$$

with $\hat{P}_S(\omega)$ an estimate for the speech power spectrum,
$\hat{P}_N(\omega)$ an estimate for the noise power spectrum, and $\alpha$ a
noise suppression factor. The preferred embodiments
modify the generalized Wiener filter by using an $\alpha$ which
tracks the signal-to-noise power ratio of the input rather than
just a constant.

Heuristically, the preferred embodiment may be understood
in terms of the following intuitive analysis. First, take
$\hat{P}_S(\omega)$ to be $cP_Y(\omega)$ for a constant c with $P_Y(\omega)$ the power
spectrum of the input noisy speech modelled by LPC. That
is, the LPC model for y(j) in some sense removes the noise.
Then solve for c by substituting this presumption into the
statement that the speech and the noise are uncorrelated
($P_Y(\omega) = P_S(\omega) + P_N(\omega)$) and integrating (summing) over all
frequencies to yield:

$$E_Y = cE_Y + E_N$$

where $E_Y$ is the
energy of the noisy speech LPC model and also an estimate
for the energy of y(j), and $E_N$ is the energy of the noise in
the frame. Thus, $c = (E_Y - E_N)/E_Y$ and so
$\hat{P}_S(\omega) = [(E_Y - E_N)/E_Y]P_Y(\omega)$. Then inserting this into the
definition of the generalized Wiener filter transfer function gives:

$$H(\omega)^2 = \frac{P_Y(\omega)}{P_Y(\omega) + [E_Y/(E_Y - E_N)]\,\alpha\,\hat{P}_N(\omega)}$$

Now take the factor multiplying $\hat{P}_N(\omega)$ (i.e., $[E_Y/(E_Y - E_N)]\alpha$)
as inversely dependent upon signal-to-noise ratio (i.e.,
$[E_Y/(E_Y - E_N)]\alpha = \kappa E_N/E_S$ for a constant $\kappa$, with
$E_S = E_Y - E_N$ the speech energy) so that the noise
suppression varies from frame to frame and is greater for
frames with small signal-to-noise ratios. Thus the modified
generalized Wiener filter ensures stronger suppression for
noise-only frames and weaker suppression for voiced-speech
frames which are not noise corrupted as much. In
short, take $\alpha = \kappa E_N/E_Y$, so the noise suppression factor has
been made inversely dependent on the signal-to-noise ratio,
and the filter transfer function becomes:

$$H(\omega)^2 = \frac{P_Y(\omega)}{P_Y(\omega) + [E_Y/(E_Y - E_N)]\,\kappa\,(E_N/E_Y)\,\hat{P}_N(\omega)}$$

Optionally, average $\alpha$ by weighting with the $\alpha$ from the
preceding frame to limit discontinuities. Further, the value of
the constant $\kappa$ can be increased to obtain higher noise
suppression, which does not result in fluctuations in the
speech as much as it does for standard spectral subtraction
because $H(\omega)$ is always nonnegative.



In more detail, the modified generalized Wiener filter
preferred embodiment proceeds through the following steps
as illustrated in FIG. 4:

(1) Partition an input stream of noisy speech sampled at
8 KHz into 256-sample frames with a 50% overlap
between successive frames; that is, each frame shares
its first 128 samples with the preceding frame and
shares its last 128 samples with the succeeding frame.
This yields an input stream of frames with each frame
having 32 msec of samples and a new frame beginning
every 16 msec.

(2) Multiply each frame with a Hann window of width
256. (A Hann window has the form
$w(j) = (1 + \cos(2\pi j/N))/2$ with N+1 the window width.) Thus each frame
has 256 samples y(j) and the frames add to reconstruct
the input speech stream.

(3) For each windowed frame, find the 8th order LPC
filter coefficients $a_0$ (=1), $a_1, a_2, \ldots, a_8$ by solving the
following eight equations for eight unknowns:

$$\sum_k a_k r(j+k) = 0 \quad \text{for } j = 1, 2, \ldots, 8$$

where r(.) is the autocorrelation function of y(.).

(4) Form the discrete Fourier transform $A(\omega) = \sum_k a_k e^{-ik\omega}$,
and then estimate $P_Y(\omega)$ for use in the generalized
Wiener filter as $E_Y/|A(\omega)|^2$ with $E_Y = \sum_k a_k r(k)$ the energy
of the LPC model. This just uses the LPC synthesis
filter spectrum as a smoothed version of the noisy
speech spectrum and prevents erratic spectral fluctuations
from affecting the generalized Wiener filter.

(5) Estimate the noise power spectrum $P_N(\omega)$ for use in
the generalized Wiener filter by updating the estimate
from the previous frame, $P'_N(\omega)$, using the current
frame smoothed estimate for the noisy speech power
spectrum, $P_Y(\omega)$, as follows:

$$P_N(\omega) = \begin{cases} 0.978\,P'_N(\omega) & \text{if } P_Y(\omega) < 0.978\,P'_N(\omega) \\ P_Y(\omega) & \text{if } 0.978\,P'_N(\omega) \le P_Y(\omega) \le 1.006\,P'_N(\omega) \\ 1.006\,P'_N(\omega) & \text{if } 1.006\,P'_N(\omega) < P_Y(\omega) \end{cases}$$

Thus the noise spectrum estimate can increase at 3 dB per
second and decrease at 12 dB per second. For the first frame,
just take $P_N(\omega)$ equal to $P_Y(\omega)$. And $E_N$ is the integration
(sum) of $P_N$ over all frequencies.

Also, optionally, to handle abrupt increases in noise level,
use a counter to keep track of the number of successive
frames in which the condition $P_Y(\omega) > 1.006\,P'_N(\omega)$ occurs. If 75
successive frames have this condition, then change the
multiplier from 1.006 to $(1.006)^2$ and restart the counter at
0. And if the next successive 75 frames have the condition
$P_Y(\omega) > (1.006)^2 P'_N(\omega)$, then change the multiplier from
$(1.006)^2$ to $(1.006)^3$. Continue in this fashion provided 75
successive frames all satisfy the condition. Once a
frame violates the condition, return to the initial multiplier
of 1.006.

Of course, other multipliers and count limits could be
used.

(6) Compute $\alpha = \kappa E_N/E_Y$ to use in the generalized Wiener
filter. Typically, $\kappa$ will be about 6-7 with larger values
for increased noise suppression and smaller values for
less. Optionally, $\alpha$ may be filtered by averaging with
the preceding frame by:

$$\tilde{\alpha} = \max(1,\ 0.8\alpha + 0.2\alpha')$$

where $\alpha'$ is the $\alpha$ of the preceding frame. That is,
$\alpha = \kappa E_N/E_Y$ for the current frame with $E_N$ the energy
of the noise estimate $P_N(\omega)$ and $E_Y$ the energy of the
noisy speech LPC model, and $\alpha'$ is the same expression
but for the previous frame. FIG. 4 shows this optional
filtering with a broken line.

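(7) Compute the first approximation modified generalized
Wiener filter for each frequency as:

$$H_1(\omega)^2 = \frac{P_Y(\omega)}{P_Y(\omega) + [E_Y/(E_Y - E_N)]\,\alpha\,P_N(\omega)}$$

with $P_Y(\omega)$ and $E_Y$ from step (4), $P_N(\omega)$ and $E_N$ from
step (5), and $\alpha$ from step (6).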

(8) Clamp $H_1(\omega)$ to avoid excess noise suppression by
defining a second approximation: $H_2(\omega) = \max(-10\ \text{dB}, H_1(\omega))$.
Alternatively, an adaptive clamp could be used.

(9) Optionally, smooth the second approximation by
convolution with a window $W(\omega)$ having weights such as
[0.1, 0.2, 0.4, 0.2, 0.1] to define a third approximation
$H_3(\omega) = W * H_2(\omega)$. FIG. 4 indicates this optional
smoothing in brackets.

(10) Extend $H_2(\omega)$ (or $H_3(\omega)$ if used) to the range $\pi < \omega < 2\pi$
or $-\pi < \omega < 0$ by symmetry to define $H(\omega)$. The periodicity
of $H(\omega)$ makes these extensions equivalent.

(11) Compute the 256-point discrete Fourier transform of
y(j) to obtain $Y(\omega)$.

(12) Take $\hat{S}(\omega) = H(\omega)Y(\omega)$ as an estimate for the spectrum
of the frame of speech with noise removed.

(13) Compute the 256-point inverse discrete Fourier
transform of $\hat{S}(\omega)$ and take the inverse transform to be
the estimate $\hat{s}(j)$ of speech with noise removed for the
frame.

(14) Add the $\hat{s}(j)$ of the overlapping portions of successive
frames to get s(j) as the final noise suppressed
estimate.
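
The framing of steps (1)-(2) and the overlap-add of step (14) can be illustrated with the short sketch below. It is not the specification's implementation: it assumes NumPy, uses the periodic form of the Hann window, $0.5(1 - \cos(2\pi j/256))$ for j = 0..255 (a shifted version of the window in step (2), chosen because two such windows offset by 128 samples sum exactly to one), and the names `hann_periodic`, `split_frames`, and `overlap_add` are hypothetical.

```python
import numpy as np

FRAME = 256   # samples per frame (32 msec at 8 KHz)
HOP = 128     # a new frame starts every 128 samples (16 msec)

def hann_periodic(n=FRAME):
    """Periodic Hann window; w[j] + w[j + n/2] equals 1 for every j."""
    j = np.arange(n)
    return 0.5 * (1.0 - np.cos(2.0 * np.pi * j / n))

def split_frames(x):
    """Return the windowed, 50%-overlapping frames of signal x as rows."""
    w = hann_periodic()
    count = (len(x) - FRAME) // HOP + 1
    return np.stack([w * x[i * HOP : i * HOP + FRAME] for i in range(count)])

def overlap_add(frames):
    """Sum the overlapping portions of successive frames (step (14))."""
    out = np.zeros(HOP * (len(frames) - 1) + FRAME)
    for i, frame in enumerate(frames):
        out[i * HOP : i * HOP + FRAME] += frame
    return out

# With no filtering in between, the interior of the signal is recovered.
x = np.random.default_rng(0).standard_normal(1024)
y = overlap_add(split_frames(x))
```

Only the first and last 128 samples, which are covered by a single frame, differ from the input; every interior sample is reconstructed because the two overlapping window values add to one.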

FIG. 11 shows in block form preferred embodiment noise 
suppressor 1100 which implements the nonoptional func- 
tions of a modified generalized Wiener filter preferred 
embodiment. In particular, FFT module 1102 performs a fast 
Fourier transform of an input frame to give Y(.) and auto- 
correlator 1104 performs autocorrelation on the input frame 
to yield r(.). LPC coefficient analyzer 1106 derives the LPC 
coefficients $a_j$, and ALU 1108 then forms the power estimate
$P_Y(.)$ plus the frame energy estimate $E_Y$. ALU 1110 uses $P_Y(.)$
to update the noise power estimate $P'_N$ held in noise buffer
1112 to give $P_N$, which is stored in noise buffer 1112. ALU
1110 also generates $E_N$, which together with $E_Y$ from ALU
1108 allows ALU 1114 to find $\alpha$. ALU 1116 takes the outputs
of ALUs 1108, 1110, and 1114 to derive the first
approximation $H_1$, and clamper 1118 then yields $H_2$ to be used in
multiplier 1120 to perform the filtering. IFFT module 1122
performs the inverse FFT to yield the output filtered frame.
Each component has associated buffer memory, and
controller 1130 provides the timing and enablement signals to
the various components. The adaptive clamp could be used
for clamper 1118.

Insertion of noise suppressor 1100 into the systems of
FIGS. 1a-b as the noise suppression block provides
preferred embodiment systems in which noise suppressor 1100
in part controls the output.
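
As an illustration of the work done by ALUs 1110, 1114, 1116 and clamper 1118, the following sketch implements the noise update of step (5), including the optional counter, and the gain computation of steps (6)-(8). It is a sketch under several assumptions that the text leaves open: the counter and multiplier are kept separately for each frequency bin, the -10 dB floor is applied to $H^2$ as a power ratio of 0.1, a small guard keeps $E_Y - E_N$ positive in noise-only frames, and all spectra are strictly positive NumPy arrays. The names `NoiseTracker` and `wiener_gain` are hypothetical.

```python
import numpy as np

class NoiseTracker:
    """Noise power spectrum estimate P_N updated frame by frame."""

    def __init__(self, up=1.006, down=0.978, run_limit=75):
        self.up, self.down, self.run_limit = up, down, run_limit
        self.p_n = None                      # set on the first frame

    def update(self, p_y):
        p_y = np.asarray(p_y, dtype=float)
        if self.p_n is None:
            # First frame: take P_N equal to P_Y.
            self.p_n = p_y.copy()
            self.run = np.zeros(p_y.shape, dtype=int)
            self.exponent = np.ones(p_y.shape, dtype=int)
            return self.p_n
        ceiling = self.up ** self.exponent * self.p_n
        floor = self.down * self.p_n
        rising = p_y > ceiling
        # The three cases of step (5) are a clip between floor and ceiling.
        self.p_n = np.clip(p_y, floor, ceiling)
        # Optional counter: after run_limit successive rising frames use
        # the next power of the multiplier; any other frame resets it.
        self.run = np.where(rising, self.run + 1, 0)
        self.exponent = np.where(rising, self.exponent, 1)
        bump = self.run >= self.run_limit
        self.exponent = np.where(bump, self.exponent + 1, self.exponent)
        self.run = np.where(bump, 0, self.run)
        return self.p_n

def wiener_gain(p_y, p_n, e_y, e_n, kappa=6.0, alpha_prev=None, floor_db=-10.0):
    """Return (H, alpha) for one frame following steps (6)-(8)."""
    alpha = kappa * e_n / e_y                              # step (6)
    if alpha_prev is not None:                             # optional filtering
        alpha = max(1.0, 0.8 * alpha + 0.2 * alpha_prev)
    speech_energy = max(e_y - e_n, 1e-12 * e_y)            # guard (assumption)
    h1_sq = p_y / (p_y + (e_y / speech_energy) * alpha * p_n)   # step (7)
    floor = 10.0 ** (floor_db / 10.0)                      # -10 dB as power ratio
    return np.sqrt(np.maximum(h1_sq, floor)), alpha        # step (8)
```

In a noise-only frame $E_N$ approaches $E_Y$, the factor multiplying $P_N$ becomes very large, and the gain falls to the clamp; in a frame with strong speech the factor stays small and the gain remains close to one.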

Codebook based generalized Wiener filter preferred embodiment

FIG. 5 illustrates the flow for codebook-based generalized
Wiener filter noise suppression preferred embodiments having
filter transfer functions:

$$H(\omega)^2 = \frac{\hat{P}_S(\omega)}{\hat{P}_S(\omega) + \alpha \hat{P}_N(\omega)}$$

with $\alpha$ the noise suppression constant. Heuristically, the
preferred embodiments estimate the noise $\hat{P}_N(\omega)$ in the
same manner as step (5) of the previously described
generalized Wiener filter preferred embodiments, and estimate
$\hat{P}_S(\omega)$ by the use of the line spectral frequencies (LSF) of
the input noisy speech as weightings for LSFs from a
codebook of noise-free speech samples. In particular,
codebook preferred embodiments proceed as follows.

(1) Partition an input stream of speech sampled at 8 KHz 
into 256-sample frames with a 50% overlap between 
successive frames; that is, follow the first step of the 
modified generalized Wiener filter preferred embodi- 
ments. 

(2) Multiply each frame with a Hann window of width 
256; again following the modified generalized Wiener 
filter preferred embodiment. 

(3) For each windowed frame with samples y(j), find the
Mth (typically 8th) order LPC filter coefficients $a_0$ (=1),
$a_1, a_2, \ldots, a_M$ by solving the M linear equations for M
unknowns:

$$\sum_i a_i r(j+i) = 0 \quad \text{for } j = 1, 2, \ldots, M$$

where r(.) is the autocorrelation of y(.). This again
follows the modified generalized Wiener filter preferred
embodiments. The gain of the LPC spectrum is $\sum_i a_i r(i)$.

(4) Compute the line spectral frequencies (LSF) from the
LPC coefficients. That is, set $P(z) = A(z) + A(1/z)/z^M$ and
$Q(z) = A(z) - A(1/z)/z^M$ where
$A(z) = 1 + a_1/z + a_2/z^2 + \ldots + a_M/z^M$ is the analysis LPC filter, and solve for the roots
of the polynomials P(z) and Q(z). These roots all lie on
the unit circle $|z| = 1$ and so have the form $e^{i\omega}$ with the
$\omega$ being the LSFs for the noisy speech frame. Recall
that the use of LSFs instead of LPC coefficients for
speech coding provides better quantization error properties.

(5) Compute the distance of the noisy speech frame LSFs
from each of the entries of a codebook of M-tuples of
LSFs. That is, each codebook entry is a set of M LSFs
in size order. The codebook has 256 such entries
which have been determined by conventional vector
quantization training (e.g., LBG algorithm) on sets of M
LSFs from noise-free speech samples.

In more detail, let $(LSF_{j,1}, LSF_{j,2}, LSF_{j,3}, \ldots, LSF_{j,M})$ be the
M LSFs of the jth entry of the codebook; then take the
distance of the noisy speech frame LSFs,
$(LSF_{n,1}, LSF_{n,2}, LSF_{n,3}, \ldots, LSF_{n,M})$, from the jth entry to be:

where $LSF_{n,c(i)}$ is the noisy speech frame LSF which is the
closest to $LSF_{n,i}$ (so c(i) will be either i-1 or i+1 if the $LSF_{n,i}$
are in size order). Thus, this distance measure is dominated
by the $LSF_{n,i}$ which are close to each other, and this provides
good results because such LSFs have a higher chance of
being formants in the noisy speech frame.

(6) Estimate the M LSFs $(LSF_{s,1}, LSF_{s,2}, \ldots, LSF_{s,M})$ for
the noise-free speech of the frame by a probability
weighting of the codebook LSFs:

$$LSF_{s,i} = \sum_j p_j\,LSF_{j,i}$$

where the probabilities $p_j$ derive from the distance
measures of the noisy speech frame LSFs from the
codebook entries:

where the constant $\gamma$ controls the dynamic range for the
probabilities and can be taken equal to 0.002. Larger
values of $\gamma$ imply increased emphasis on the weights of
the higher probability codewords.

(7) Convert the estimated noise-free speech LSFs to LPC
coefficients, $a'_i$, and compute the estimated noise-free
speech power spectrum as

$$\hat{P}_S(\omega) = \frac{\sum_i a_i r(i)}{\left|\sum_k a'_k e^{-ik\omega}\right|^2}$$

where $\sum_i a_i r(i)$ is the gain of the LPC spectrum from step
(3).

(8) Estimate the noise power spectrum $\hat{P}_N(\omega)$ as before:
see step (5) of the modified generalized Wiener filter
section.

(9) Take $\alpha$ equal to 10, and form the filter transfer function

$$H_1(\omega)^2 = \frac{\hat{P}_S(\omega)}{\hat{P}_S(\omega) + \alpha \hat{P}_N(\omega)}$$

where $\hat{P}_S(\omega)$ comes from step (7) and $\hat{P}_N(\omega)$ from step
(8).

(10) Clamp $H_1(\omega)$ as in the other preferred embodiments
to avoid filter fluctuations to obtain the final generalized
Wiener filter transfer function: $H(\omega) = \max(-10\ \text{dB}, H_1(\omega))$.
Alternatively, an adaptive clamp could be used.

(11) Compute the 256-point discrete Fourier transform of
y(j) to obtain $Y(\omega)$.

(12) Take $\hat{S}(\omega) = H(\omega)Y(\omega)$ as an estimate for the spectrum
of the frame of speech with noise removed.

(13) Compute the 256-point inverse fast Fourier transform
of $\hat{S}(\omega)$ to be the estimate $\hat{s}(j)$ of speech with noise
removed for the frame.

(14) Iterate steps (3)-(13) six or seven times using the
estimate $\hat{s}(j)$ from step (13) for y(j) in step (3). FIG. 5
shows the iteration path.

(15) Add the $\hat{s}(j)$ of the overlapping portions of successive
frames to get s(j) as the final noise suppressed
estimate.
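
The probability weighting of steps (5)-(6) might be sketched as follows. The distance and probability formulas are not legible in this text, so the forms used here are assumptions chosen only to match the surrounding description: each squared LSF difference is divided by the squared gap between the noisy LSF and its nearest neighbor (so closely spaced LSFs dominate), and the probabilities are normalized exponentials $\exp(-\gamma d_j)$. The function names are hypothetical, NumPy is assumed, and at least two LSFs per frame are required.

```python
import numpy as np

def lsf_distances(lsf_noisy, codebook):
    """Weighted squared distance from the noisy LSFs to every codebook row.

    ASSUMED form: sum_i (LSF_j,i - LSF_n,i)^2 / (LSF_n,i - LSF_n,c(i))^2,
    with c(i) the index of the nearest neighboring noisy LSF.
    """
    gaps = np.diff(lsf_noisy)                 # spacing between sorted LSFs
    nearest = np.empty_like(lsf_noisy)
    nearest[0], nearest[-1] = gaps[0], gaps[-1]
    nearest[1:-1] = np.minimum(gaps[:-1], gaps[1:])
    return (((codebook - lsf_noisy) ** 2) / nearest ** 2).sum(axis=1)

def estimate_clean_lsf(lsf_noisy, codebook, gamma=0.002):
    """Probability-weighted average of the codebook LSFs (step (6))."""
    d = lsf_distances(lsf_noisy, codebook)
    p = np.exp(-gamma * (d - d.min()))        # ASSUMED exponential weighting
    p /= p.sum()                              # probabilities sum to one
    return p @ codebook, p

# Toy codebook with two entries of M = 4 LSFs each (radians, size order).
codebook = np.array([[0.3, 0.9, 1.5, 2.4],
                     [0.5, 1.0, 2.0, 2.8]])
clean_lsf, probs = estimate_clean_lsf(np.array([0.3, 0.9, 1.5, 2.4]), codebook)
```

Because the estimate is a convex combination of entries that are each in size order, the estimated LSFs remain in size order, which keeps the subsequent LSF-to-LPC conversion of step (7) well behaved.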

FIG. 12 shows in block form preferred embodiment noise
suppressor 1200 which implements the codebook modified
generalized Wiener filter preferred embodiment. In
particular, FFT 1202 performs a fast Fourier transform of an
input frame to give Y(.) and autocorrelator 1204 performs
autocorrelation on the input frame to yield r(.). LPC
coefficient analyzer 1206 derives the LPC coefficients $a_j$, and
LPC-to-LSF converter 1208 gives the LSF coefficients to
ALU 1210. Codebook 1212 provides codebook LSF
coefficients to ALU 1210 which then forms the noise-free signal
LSF coefficient estimates and passes them to LSF-to-LPC
converter 1214 for conversion to LPC estimates and then to
ALU 1216 to form power estimate $\hat{P}_S(.)$. Noise buffer 1220
and ALU 1222 update the noise estimate as with the preceding
preferred embodiments, and ALU 1224 uses $\hat{P}_S(.)$ and $\hat{P}_N(.)$
to form the first approximation unclamped $H_1$ and clamper
1226 then yields clamped $H_1$ to be used in multiplier 1230
to perform the filtering. IFFT 1232 performs the inverse FFT
to yield the first approximation filtered frame. Iteration
counter 1234 sends the first approximation filtered frame back to
autocorrelator 1204 to start generation of a second
approximation filter $H_2$. This second approximation filter applied to
Y(.) yields the second approximation filtered frame which
iteration counter 1234 again sends back to autocorrelator
1204 to start generation of a third approximation $H_3$.
Iteration counter 1234 repeats this six times to finally yield a seventh
approximation filter and filtered frame which then becomes
the output filtered frame. Each component has associated
buffer memory, and controller 1240 provides the timing and
enablement signals to the various components. The adaptive
clamp could be used for clamper 1226.
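
The conversion performed by LPC-to-LSF converter 1208 can be illustrated by direct root finding. This sketch is an assumption-laden illustration rather than the converter's actual design: it uses NumPy's general polynomial root finder, and it builds the sum and difference polynomials with the widely used order-(M+1) mirror, $A(z) \pm z^{-(M+1)}A(1/z)$, so that exactly M frequencies result for a minimum-phase A(z). The name `lpc_to_lsf` is hypothetical.

```python
import numpy as np

def lpc_to_lsf(a):
    """Return the sorted line spectral frequencies (radians in (0, pi))
    for LPC coefficients a = [1, a_1, ..., a_M] of a minimum-phase A(z)."""
    a = np.asarray(a, dtype=float)
    padded = np.append(a, 0.0)          # A(z) as a degree-(M+1) polynomial
    mirrored = padded[::-1]             # coefficients of z^-(M+1) * A(1/z)
    p = padded + mirrored               # symmetric polynomial P
    q = padded - mirrored               # antisymmetric polynomial Q
    angles = np.concatenate([np.angle(np.roots(p)), np.angle(np.roots(q))])
    eps = 1e-6                          # drops the trivial roots at z = 1, -1
    keep = (angles > eps) & (angles < np.pi - eps)
    return np.sort(angles[keep])        # one angle per conjugate root pair

# Second-order example: A(z) = 1 - 0.9/z + 0.2/z^2 has roots 0.5 and 0.4.
lsf = lpc_to_lsf([1.0, -0.9, 0.2])
```

For this example P factors as $(z+1)(z^2 - 1.7z + 1)$ and Q as $(z-1)(z^2 - 0.1z + 1)$, so the two LSFs have cosines 0.85 and 0.05. A production converter would more likely evaluate the polynomials on a grid of cosines than call a general root finder.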

Insertion of noise suppressor 1200 into the systems of
FIGS. 1a-b as the noise suppression blocks provides
preferred embodiment systems in which noise suppressor 1200
in part controls the output.

Internal precision control 

The preferred embodiments employ various operations 
such as FFT, and with low power frames the signal samples 
are small and precision may be lost in multiplications. For 
example, squaring a 16-bit fixed-point sample will yield a 
32-bit result, but memory limitations may demand that only 
16 bits be stored and so only the upper 16 bits will be chosen 
to avoid overflow. Thus an input sample with only the lowest 
9 bits nonzero will have an 18-bit answer which implies only 
the two most significant bits will be retained and thus a loss 
of precision. 
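
The loss of precision in this example can be reproduced with a few lines of integer arithmetic. The sketch assumes a non-negative sample and models "keeping the upper 16 bits" as a right shift of the 32-bit product by 16; the function name is hypothetical.

```python
def square_keep_upper16(sample: int) -> int:
    """Square a non-negative 16-bit sample and keep only the upper
    16 bits of the 32-bit product."""
    product = sample * sample
    return (product >> 16) & 0xFFFF

sample = 0x01FF                       # only the lowest 9 bits are nonzero
product = sample * sample             # 261121, an 18-bit number
kept = square_keep_upper16(sample)    # 3, only 2 significant bits survive
print(product.bit_length(), kept.bit_length())   # prints "18 2"
```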

An automatic gain control to bring input samples up to a 
higher level avoids such a loss of precision but destroys the 
power level information: both loud and quiet input speech 
will have the same power output levels. Also, such auto- 
matic gain control typically relies on the sample stream and 
does not consider a frame at a time. 

A preferred embodiment precision control method pro- 
ceeds as follows. 

(1) Presume an (N+1)-bit two's complement integer
format for the noisy speech samples u(j) and other
variables, and presume that the variables have been
scaled to the range $-1 \le X < +1$. Thus for 16-bit format
with hexadecimal notation, variables lie in the range
from 8000 to 7FFF. First, estimate the power for an
input frame of 256 samples by $\sum_j u(j)^2$ with the sum over
the corresponding 256 j's.

(2) Count the number of significant bits, S, in the power 
estimate sum. Note that with |u(j)| having an average 
size of K significant bits, S will be about 2K+8. So the 
number of bits in the sum reflects the average sample 
magnitude with the maximum possible S equal 2N+8. 

(3) Pick the frame scaling factor so as to set the average
sample size to have (2N+8-S)/2-H significant bits
where H is an integer, such as 3, of additional headroom
bits. That is, the frame scaling factor is
$2^{(2N+8-S)/2-H}$. In terms of the K of step (2), the scaling factor
equals $2^{N-K-H}$. For example, with 16-bit format and 3
overhead bits, if the average sample magnitude is $2^{-9}$ (7
significant bits), then the scaling factor will be $2^5$ so the
average scaled sample magnitude is $2^{-4}$ which leaves
3 bits ($2^3$) before overflow occurs at $2^0$.

(4) Apply the Hann window (see steps (1)-(2) of the
modified generalized Wiener filter section) to the frame
by pointwise multiplication. Thus with y(j) denoting
the windowed samples,



